163 research outputs found

    Character-Word LSTM Language Models

    We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar number of parameters, and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.
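The concatenation of word and character embeddings described in the abstract can be sketched roughly as follows. The dimensions, the toy embedding tables, and the `input_vector` helper are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100-dim word embeddings, 15-dim char embeddings,
# and the first 5 characters of each word.
WORD_DIM, CHAR_DIM, MAX_CHARS = 100, 15, 5

word_emb = {"cats": rng.normal(size=WORD_DIM)}
char_emb = {c: rng.normal(size=CHAR_DIM) for c in "abcdefghijklmnopqrstuvwxyz"}

def input_vector(word):
    """Concatenate the word embedding (zeros if out-of-vocabulary) with
    the embeddings of the first MAX_CHARS characters (zero-padded)."""
    w = word_emb.get(word, np.zeros(WORD_DIM))  # OOV word -> zero vector,
    # but its character embeddings below still carry structural information
    chars = [char_emb.get(c, np.zeros(CHAR_DIM)) for c in word[:MAX_CHARS]]
    chars += [np.zeros(CHAR_DIM)] * (MAX_CHARS - len(chars))
    return np.concatenate([w] + chars)  # WORD_DIM + MAX_CHARS * CHAR_DIM dims

print(input_vector("cats").shape)  # (175,)
```

Even for an unknown word, the character part of the vector is non-zero, which is what lets the model handle out-of-vocabulary items.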

    The effect of word similarity on N-gram language models in Northern and Southern Dutch

    In this paper we examine several combinations of classical N-gram language models with more advanced and well-known techniques based on word similarity, such as cache models and Latent Semantic Analysis. We compare the efficiency of these combined models to a model that combines N-grams with the recently proposed, state-of-the-art neural network-based continuous skip-gram. We discuss the strengths and weaknesses of each of these models, based on their predictive power for the Dutch language, and find that a linear interpolation of a 3-gram, a cache model and a continuous skip-gram is capable of reducing perplexity by up to 18.63%, compared to a 3-gram baseline. This is three times the reduction achieved with a 5-gram. In addition, we investigate whether and in what way the effect of Southern Dutch training material on these combined models differs when evaluated on Northern and Southern Dutch material. Experiments on Dutch newspaper and magazine material suggest that N-grams are mostly influenced by the register and not so much by the language (variety) of the training material. Word similarity models, on the other hand, seem to perform best when they are trained on material in the same language (variety).
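The linear interpolation of component models and the perplexity metric used to compare them can be sketched as below. The interpolation weights and toy probabilities are made up for illustration; in practice the weights are tuned on held-out data:

```python
import math

def interp_prob(p_ngram, p_cache, p_skipgram, lam=(0.6, 0.2, 0.2)):
    """Linearly interpolate three component models; the weights here
    are illustrative and would normally be tuned on held-out data."""
    return sum(l * p for l, p in zip(lam, (p_ngram, p_cache, p_skipgram)))

def perplexity(word_probs):
    """Perplexity of a test sequence given per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Toy numbers: interpolation lifts low n-gram probabilities where the
# cache or skip-gram component is confident, lowering perplexity.
baseline = [0.08, 0.12, 0.05, 0.10]
combined = [interp_prob(p, 0.2, 0.15) for p in baseline]
print(perplexity(baseline), perplexity(combined))
```

Since a lower perplexity means the model assigns higher probability to the test data, the reported 18.63% reduction corresponds to the combined model being substantially less "surprised" by held-out Dutch text than the 3-gram baseline.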

    A Comparison of Different Punctuation Prediction Approaches in a Translation Context

    We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memory (LSTM) models, sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step which is followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results. This research was done in the context of the SCATE project, funded by the Flemish Agency for Innovation and Entrepreneurship (IWT project 13007).
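When punctuation prediction is cast as sequence labeling, each word is labeled with the punctuation mark that follows it (or a "none" label). A minimal sketch of that data framing (the `to_labels` helper and the label scheme are illustrative assumptions, not the paper's exact setup):

```python
PUNCT = {",", ".", "?", "!"}

def to_labels(tokens):
    """Turn a punctuated token stream into (word, label) pairs, where
    the label is the punctuation mark following the word, or 'O' if
    none follows -- the framing used for sequence-labeling models."""
    pairs, i = [], 0
    while i < len(tokens):
        if tokens[i] in PUNCT:  # skip stray leading punctuation tokens
            i += 1
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt in PUNCT:
            pairs.append((tokens[i], nxt))
            i += 2
        else:
            pairs.append((tokens[i], "O"))
            i += 1
    return pairs

print(to_labels("hello , how are you ?".split()))
# [('hello', ','), ('how', 'O'), ('are', 'O'), ('you', '?')]
```

A sequence labeler (e.g. a bidirectional LSTM) is then trained on the word sequence as input and the label sequence as output, while the MT-based approaches instead "translate" the unpunctuated stream into a punctuated one.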

    Fermentable soluble fibres spare amino acids in healthy dogs fed a low-protein diet

    Background: Research in cats has shown that increased fermentation-derived propionic acid and its metabolites can be used as alternative substrates for gluconeogenesis, thus sparing amino acids for other purposes. This amino acid sparing effect could be of particular interest in patients with kidney or liver disease, where it could reduce the kidneys'/liver's burden of N-waste removal. Since dogs are known to have a different metabolism from the cat, an obligate carnivore, the main objective of this study was to assess the possibility of altering amino acid metabolism through intestinal fermentation in healthy dogs. This was studied by supplementing a low-protein diet with fermentable fibres, thereby providing an initial model for future studies in dogs suffering from renal/liver disease. Results: Eight healthy dogs were randomly assigned to one of two treatment groups: sugar beet pulp and guar gum mix (SF: soluble fibre, estimated to mainly stimulate propionic acid production) or cellulose (IF: insoluble fibre). Treatments were incorporated into a low-protein (17 %) extruded dry diet in amounts to obtain similar total dietary fibre (TDF) contents for both diets (9.4 % and 8.2 % for the SF and IF diet, respectively) and were tested in a 4-week crossover feeding trial. Apparent faecal nitrogen digestibility and post-prandial fermentation metabolites in faeces and plasma were evaluated. Dogs fed the SF diet showed significantly higher faecal excretion of acetic and propionic acid, resulting in a higher total SCFA excretion compared to IF. SF affected the three- to six-hour postprandial plasma acylcarnitine profile by significantly increasing the AUC of acetyl-, propionyl-, butyryl- + isobutyryl-, 3-OH-butyryl-, 3-OH-isovaleryl- and malonyl-L-carnitine. Moreover, the amino acid plasma profile at that time was modified as leucine + isoleucine concentrations were significantly increased by SF, and a similar trend for phenylalanine and tyrosine's AUC was found.
Conclusion: These results indicate that guar gum and sugar beet pulp supplementation diminishes postprandial use of amino acids, favoring instead the use of short-chain fatty acids as substrate for the tricarboxylic acid (TCA) cycle. Further research is warranted to investigate the amino acid sparing effect of fermentable fibres in dogs with kidney/liver disease.
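The postprandial AUC values compared in studies like this are conventionally computed with the trapezoidal rule over the sampling times; a minimal sketch with made-up plasma concentrations (the data points and units are hypothetical, not from this study):

```python
def auc_trapezoid(times, concentrations):
    """Area under a time-concentration curve via the trapezoidal rule:
    sum the trapezoid areas between consecutive sampling points."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(
            zip(times, concentrations), zip(times[1:], concentrations[1:])
        )
    )

# Hypothetical 3-6 h postprandial samples: time (h), plasma conc. (umol/L)
print(auc_trapezoid([3, 4, 5, 6], [2.0, 3.5, 3.0, 2.5]))  # 8.75
```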

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
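Fuzzy matching against a translation memory retrieves previously translated segments similar to the current source segment. A simple character-level similarity ratio (Python's standard `difflib`) serves here as a stand-in for real fuzzy-match scoring; the threshold, helper name, and toy segments are illustrative assumptions, not SCATE's actual method:

```python
from difflib import SequenceMatcher

def fuzzy_match(query, tm_segments, threshold=0.7):
    """Return (score, segment) pairs for TM segments whose similarity
    to the query meets the threshold, best match first."""
    scored = [(SequenceMatcher(None, query, s).ratio(), s) for s in tm_segments]
    return sorted((x for x in scored if x[0] >= threshold), reverse=True)

# Toy translation memory of previously translated source segments.
tm = ["Click the Save button.", "Click the Cancel button.", "Restart the computer."]
print(fuzzy_match("Click the Save button now.", tm))
```

In a real CAT environment, the retrieved high-scoring segment's stored translation would be shown to the translator for post-editing, or fed to the MT integration described above.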